Finding Semantically Related Words in Large Corpora

Authors

Pavel Smrz

Pavel Rychlý

Abstract

The paper deals with the linguistic problem of fully automatic grouping of semantically related words. We discuss measures of semantic relatedness for basic word forms and describe the treatment of collocations. Next, we present a procedure for hierarchical clustering of a very large number of semantically related words and give examples of the resulting partitioning of the data in the form of a dendrogram. Finally, we show a form of output presentation that facilitates inspection of the resulting word clusters.
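
The abstract does not specify the relatedness measures or the clustering procedure the authors use, so the following is only a minimal sketch of the general idea under stated assumptions: words are represented by co-occurrence counts in a small window, relatedness is taken to be cosine similarity, and clusters are merged by average linkage. The toy corpus, the window size, and the function names (context_vectors, cosine, cluster) are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (assumed relatedness measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(words, vectors):
    """Average-linkage agglomerative clustering.

    Returns the merge history as nested tuples, i.e. a dendrogram in tree form.
    """
    # Each cluster is a pair: (subtree built so far, flat list of member words).
    clusters = [(w, [w]) for w in words]

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(cosine(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the highest average pairwise similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical toy corpus, not data from the paper.
sentences = [
    "the cat eats food",
    "the dog eats food",
    "the car needs fuel",
    "the bus needs fuel",
]
vectors = context_vectors(sentences)
tree = cluster(["cat", "dog", "car", "bus"], vectors)
print(tree)
```

On this toy input, "cat" and "dog" share identical contexts, as do "car" and "bus", so each pair is merged first and the two pairs are joined only at the top of the tree. This naive loop recomputes every pairwise linkage at each step and would not scale to the very large vocabularies the paper targets; it is meant only to make the notion of a dendrogram over related words concrete.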

Similar resources

Investigation of Word Senses over Time Using Linguistic Corpora

Word sense induction is an important method to identify possible meanings of words. Word co-occurrences can group word contexts into semantically related topics. Beyond the words themselves, temporal information provides another dimension for investigating how word meanings develop over time. Large digital corpora of written language, such as those that are held by the CLARIN-D center...

Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity

There have been many proposals to extract semantically related words using measures of distributional similarity, but these typically are not able to distinguish between synonyms and other types of semantically related words such as antonyms, (co)hyponyms and hypernyms. We present a method based on automatic word alignment of parallel corpora consisting of documents translated into multiple lan...

A New Measure for Extracting Semantically Related Words

The identification of semantically related terms for a given word is an important problem. A number of statistical approaches have been proposed to address this problem. Most approaches draw their statistics from a large general corpus. In this paper, we propose to use specialized corpora which focus strongly on the individual words of interest. We propose to collect such corpora through target...

How textbooks (and learners) get it wrong: A corpus study of modal auxiliary verbs

Many elements contribute to the relative difficulty in acquiring specific aspects of English as a foreign language (Goldschneider & DeKeyser, 2001). Modal auxiliary verbs (e.g. could, might) are examples of a structure that is difficult for many learners. Not only are they particularly complex semantically, but especially in the Malaysian context ...

MiniCors and Cast3LB: Two Semantically Tagged Spanish Corpora

In this paper we present two Spanish corpora, MiniCors and Cast3LB, semantically tagged according to different annotation criteria and objectives. In order to guarantee the quality of the results, we have established a methodology for the development of these corpora. The resulting resources consist of a semantically tagged corpus according to the lexical sample task, and a semantically tagged ...

Publication date: 2001
